
    GMT: Enabling easy development and efficient execution of irregular applications on commodity clusters

    In this poster we introduce GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT only requires a cluster of x86 nodes supporting MPI. GMT integrates the Partitioned Global Address Space (PGAS) locality-aware global data model with a fork/join control model common in single-node multithreaded environments. GMT supports lightweight software multithreading to tolerate the latencies of accessing data on remote nodes, and is built around data aggregation to maximize network bandwidth utilization.
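
    The combination just described (a partitioned global array plus a fork/join loop whose tasks issue fine-grained accesses) can be sketched in ordinary C++. The sketch below is purely illustrative and single-process: it mimics the programming model with std::thread instead of the actual GMT API, and all names, sizes, and the hash-like index function are assumptions.

    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <thread>
    #include <vector>

    constexpr int kNodes = 4;                              // pretend cluster partitions
    constexpr std::size_t kElems = std::size_t(1) << 20;   // size of the "global" array

    // Block-partitioned "global" array (stands in for the PGAS data model).
    std::vector<std::uint64_t> global_arr(kElems, 1);

    // Locality query: which partition owns global index i.
    int owner(std::size_t i) { return static_cast<int>(i / (kElems / kNodes)); }

    // Fork/join parallel-for: spawn workers over [0, n), then join them all.
    template <typename F>
    void parallel_for(std::size_t n, F body, unsigned workers = 8) {
      std::vector<std::thread> pool;
      for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([=] {
          for (std::size_t i = w; i < n; i += workers) body(i);
        });
      for (auto& t : pool) t.join();                       // the "join" half
    }

    int main() {
      std::atomic<std::uint64_t> sum{0}, remote{0};
      const int home = 0;                                  // partition issuing the accesses
      parallel_for(kElems, [&](std::size_t i) {
        // Irregular, data-dependent index: defeats caches and prefetchers.
        std::size_t idx = static_cast<std::size_t>(i * 2654435761ull) % kElems;
        sum.fetch_add(global_arr[idx], std::memory_order_relaxed);
        if (owner(idx) != home)                            // on a cluster: a remote get
          remote.fetch_add(1, std::memory_order_relaxed);
      });
      std::printf("sum=%llu, remote=%llu of %zu accesses\n",
                  static_cast<unsigned long long>(sum.load()),
                  static_cast<unsigned long long>(remote.load()), kElems);
    }

    On a real cluster, every access whose owner is not the home node would become a remote operation whose latency GMT hides by switching among many lightweight software threads and by aggregating small messages to use network bandwidth efficiently.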

    High level synthesis of RDF queries for graph analytics

    In this paper we present a set of techniques that enable the synthesis of efficient custom accelerators for memory-intensive, irregular applications. To address the challenges of irregular applications (large memory footprint, unpredictable fine-grained data accesses, and high synchronization intensity), and to exploit their opportunities (thread-level parallelism, memory-level parallelism), we propose a novel accelerator design that employs an adaptive Distributed Controller (DC) architecture and a Memory Interface Controller (MIC) supporting concurrent and atomic memory operations on a multi-ported/multi-banked shared memory. Among the multitude of algorithms that may benefit from our solution, we focus on the acceleration of graph analytics applications and, in particular, on the synthesis of SPARQL queries on Resource Description Framework (RDF) databases. We achieve this objective by incorporating the synthesis techniques into Bambu, an open-source high-level synthesis tool, and interfacing it with GEMS, the Graph database Engine for Multithreaded Systems. The GEMS front-end generates optimized C implementations of the input queries, modeled as graph pattern matching algorithms, which are then automatically synthesized by Bambu. We validate our approach by synthesizing several SPARQL queries from the Lehigh University Benchmark (LUBM).
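
    To make the flow above concrete, the following is a hypothetical, heavily simplified C++ sketch of how a LUBM-style basic graph pattern can be lowered to nested scans over a dictionary-encoded triple table. It is not the code GEMS actually emits; the identifiers, ID values, and tiny in-memory database are assumptions made for illustration.

    // Query (LUBM-style):
    //   SELECT ?x WHERE { ?x rdf:type       ub:GraduateStudent .
    //                     ?x ub:takesCourse <GraduateCourse0> . }
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Triple { std::uint32_t s, p, o; };        // subject, predicate, object IDs

    // Dictionary-encoded constants (ID values are made up for the example).
    enum : std::uint32_t { RDF_TYPE = 1, TAKES_COURSE = 2,
                           GRAD_STUDENT = 10, GRAD_COURSE0 = 20 };

    int main() {
      std::vector<Triple> db = {
        {100, RDF_TYPE, GRAD_STUDENT}, {100, TAKES_COURSE, GRAD_COURSE0},
        {101, RDF_TYPE, GRAD_STUDENT}, {101, TAKES_COURSE, 21},
      };

      // The basic graph pattern becomes a join of two triple scans on ?x.
      for (const Triple& t1 : db) {
        if (t1.p != RDF_TYPE || t1.o != GRAD_STUDENT) continue;        // pattern 1
        for (const Triple& t2 : db) {
          if (t2.p != TAKES_COURSE || t2.o != GRAD_COURSE0) continue;  // pattern 2
          if (t2.s == t1.s)                                            // join on ?x
            std::printf("?x = %u\n", static_cast<unsigned>(t1.s));
        }
      }
    }

    Scan/join loops of this shape expose the thread-level and memory-level parallelism that the proposed DC/MIC accelerator design targets, at the cost of fine-grained, unpredictable accesses to the shared triple store.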

    The Future is Big Graphs! A Community View on Graph Processing Systems

    Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed? (12 pages, 3 figures; a collaboration between the large-scale systems and data management communities; work started at the Dagstuhl Seminar 19491 on Big Graph Processing Systems; to be published in the Communications of the ACM.)

    Hardware Acceleration of Complex Machine Learning Models through Modern High-Level Synthesis

    Machine learning algorithms continue to receive significant attention from industry and research. As the models increase in complexity and accuracy, their computational and memory demands also grow, pushing for more powerful, heterogeneous architectures; custom FPGA/ASIC accelerators are often the best solution to efficiently process large amounts of data close to the sensors in large-scale scientific experiments. Previous works exploited high-level synthesis to help design dedicated compute units for machine learning inference, proposing frameworks that translate high-level models into annotated C/C++. Our proposal, instead, integrates HLS in a compiler-based tool flow with multiple levels of abstraction, enabling analysis, optimization, and design space exploration along the whole process. Such an approach will also make it possible to explore models beyond multi-layer perceptrons and convolutional neural networks (which are often the main target of "classic" HLS frameworks), for example to address the different challenges posed by sparse and graph-based neural networks.
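
    As an illustration of why such models are harder for conventional HLS flows, the kernel below is a plain C++ sparse matrix-vector product over a Compressed Sparse Row (CSR) matrix, the basic building block of sparse and graph-neural-network layers. It is a generic example, not taken from the proposed tool flow; the indirect loads through col[] are the data-dependent accesses that dense-oriented frameworks rarely optimize for.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // y = A * x with A stored in CSR form (rowptr, col, val).
    void spmv_csr(const std::vector<int>& rowptr, const std::vector<int>& col,
                  const std::vector<float>& val, const std::vector<float>& x,
                  std::vector<float>& y) {
      for (std::size_t r = 0; r + 1 < rowptr.size(); ++r) {
        float acc = 0.0f;
        for (int k = rowptr[r]; k < rowptr[r + 1]; ++k)
          acc += val[k] * x[col[k]];                 // data-dependent gather from x
        y[r] = acc;
      }
    }

    int main() {
      // 3x3 sparse matrix [[1 0 2], [0 3 0], [4 0 5]] in CSR form.
      std::vector<int> rowptr = {0, 2, 3, 5};
      std::vector<int> col    = {0, 2, 1, 0, 2};
      std::vector<float> val  = {1, 2, 3, 4, 5};
      std::vector<float> x    = {1, 1, 1};
      std::vector<float> y(3);
      spmv_csr(rowptr, col, val, x, y);
      std::printf("y = %g %g %g\n", y[0], y[1], y[2]);   // expected: 3 3 9
    }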

    Exploring efficient hardware support for applications with irregular memory patterns on multinode manycore architectures

    With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming available for analysis. Often the data collected have complex, graph-based structures, which makes them difficult to process with traditional tools. Moreover, the irregularities in the data sets, and in the analysis algorithms, hamper performance scaling on large distributed high-performance systems, which are optimized for locality exploitation and regular data structures. In this paper we present an approach to system design that enables efficient execution of applications with irregular memory patterns on a distributed many-core architecture based on off-the-shelf cores. We introduce a set of hardware and software components which provide a distributed global address space, fine-grained synchronization, and transparent hiding of remote-access latencies through multithreading. An FPGA prototype has been implemented to explore the design with a set of typical irregular kernels. We finally present an analytical model that highlights the benefits of the approach and helps identify the bottlenecks in the prototype. The experimental evaluation on graph-based applications demonstrates the scalability of the architecture for different configurations of the whole system.
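
    A generic back-of-the-envelope version of such a latency-hiding argument (not the paper's actual analytical model; both parameter values are assumptions) can be written in a few lines of C++:

    #include <cmath>
    #include <cstdio>

    int main() {
      // Assumed, illustrative parameters; neither value comes from the paper.
      const double remote_latency_ns  = 1500.0;  // round trip to a remote node
      const double work_per_access_ns = 60.0;    // compute between remote accesses
      // Little's-law style estimate: while one thread waits on a remote access,
      // roughly latency/work additional threads are needed to keep the core busy.
      const double threads_needed =
          std::ceil(remote_latency_ns / work_per_access_ns) + 1.0;
      std::printf("~%.0f threads per core to hide %.0f ns of remote latency\n",
                  threads_needed, remote_latency_ns);
    }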